Abstract

In this paper, we study the asymptotic posterior distribution of linear functionals of
the density. In particular, we give general conditions to obtain a semiparametric version
of the Bernstein-von Mises theorem. We then apply this general result to nonparametric
priors based on infinite dimensional exponential families. As a byproduct, we also derive
adaptive nonparametric rates of concentration of the posterior distributions under these
families of priors on the class of Sobolev and Besov spaces.

Keywords Adaptive estimation, Bayesian nonparametric, Bernstein-von Mises, Rates of
convergence, Wavelet

Mathematics Subject Classification (2000) 62G20 62F15

1 Introduction

The Bernstein-von Mises property, in Bayesian analysis, concerns the asymptotic form of
the posterior distribution of a quantity of interest. More specifically, it corresponds to
the asymptotic normality of the posterior distribution, centered at some kind of maximum
likelihood estimator, with variance equal to the asymptotic frequentist variance of the
centering point. Such results are well known in parametric frameworks; see for instance
(0), where general conditions are given. This is an important property for both practical and
theoretical reasons. In particular, the asymptotic normality of the posterior distribution
allows us to construct approximate credible regions, and the duality between the behaviour of
the posterior distribution and the frequentist distribution of the asymptotic centering point
of the posterior implies that credible regions also have good frequentist properties. These
results are given in many Bayesian textbooks, see for instance (jlTI ) or yj).

From a frequentist perspective, the Bernstein-von Mises property enables the construction of
confidence regions, since under this property a Bayesian credible region is asymptotically
a frequentist confidence region as well. This is even more important in complex models, since
in such models the construction of confidence regions can be difficult, whereas Markov
Chain Monte Carlo algorithms usually make the construction of a Bayesian credible region
feasible. However, the more complex the model, the harder it is to derive Bernstein-von
Mises theorems. In infinite dimensional setups, the mechanisms are even more complex.

Semi-parametric and nonparametric models are widely popular from both a theoretical
and a practical perspective and have been used by frequentists as well as Bayesians, although
their theoretical asymptotic properties have mainly been studied in the frequentist literature.
The use of Bayesian nonparametric or semi-parametric approaches is more recent. It has
been made possible mainly by the development of algorithms such as Markov Chain Monte
Carlo algorithms, and it has grown rapidly over the past decade.

However, there is still little work on asymptotic properties of Bayesian procedures in
semi-parametric models or even in nonparametric models. Most existing works on
asymptotic posterior distributions deal with consistency or rates of concentration of the
posterior. In other words, they consist in controlling objects of the form $P^\pi[U_n \mid X^n]$,
where $P^\pi[\cdot \mid X^n]$ denotes the posterior distribution given an $n$-vector of
observations $X^n$ and $U_n$ denotes either a fixed neighbourhood (consistency) or a sequence
of shrinking neighbourhoods (rates of concentration). As remarked by ([^), consistency is an
important condition since it is not possible to construct subjective priors in a nonparametric
framework. Obtaining concentration rates of the posterior helps in understanding the impact
of the choice of a specific prior and allows for a comparison between priors to some extent.
However, to obtain a Bernstein-von Mises theorem it is necessary not only to bound
$P^\pi[U_n \mid X^n]$ but to determine an equivalent of $P^\pi[U_n \mid X^n]$ for some specific
types of sets $U_n$. This difficulty explains why there is up to now very little work on
Bernstein-von Mises theorems in infinite dimensional models. The most well known results are
negative results and are given in (0). Some positive results are provided by (0) on the
asymptotic normality of the posterior distribution of the parameter in an exponential family
with an increasing number of parameters. In a discrete setting, (0) derive Bernstein-von Mises
results, in particular satisfied by Dirichlet priors. Nice positive results are obtained in
(IJ) and (jla); however, they rely heavily on a conjugacy type of property of the family of
priors they consider and on the fact that their priors put mass one on discrete probabilities,
which makes the comparison with the empirical distribution more tractable.

In a semi-parametric framework, where the parameter can be separated into a parametric
part, which is the parameter of interest, and a nonparametric part, which is the nuisance
parameter, (0) obtains interesting conditions leading to a Bernstein-von Mises theorem on
the parametric part, clarifying an earlier work of (jlSl ).

In this paper we are interested in studying the existence of a Bernstein- Von Mises property 
in semi-parametric models where the parameter of interest is a functional of the nuisance 
parameter, which is the density of the observations. The estimation of functionals of infinite 
dimensional parameters such as the cumulative distribution function at a specific point, is 
a widely studied problem both in the frequentist literature and in the Bayesian literature. 
There is a vast literature on the rates of convergence and on the asymptotic distribution of 
frequentist estimates of functionals of unknown curves and of finite dimensional functionals 
of curves in particular, see for instance (j2ll ) for an excellent presentation of a general theory 
on such problems. 

One of the most common functionals considered in the literature is the cumulative
distribution function calculated at a given point, say $F(x)$. The empirical cumulative
distribution function $F_n(x)$ is a natural frequentist estimator and its asymptotic
distribution is Gaussian with mean $F(x)$ and variance $F(x)(1 - F(x))/n$.
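As a purely illustrative check of the variance formula above, the following minimal Python sketch compares the Monte Carlo variance of $F_n(x)$ with $F(x)(1 - F(x))/n$. The function names, the choice of Uniform[0,1] data (so that $F(x) = x$) and the simulation sizes are assumptions made for this example only; they do not come from the paper.

```python
import random


def ecdf_at(sample, x):
    """Empirical cdf F_n(x): fraction of observations <= x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)


def asymptotic_variance(F_x, n):
    """Variance F(x)(1 - F(x))/n of the Gaussian approximation of F_n(x)."""
    return F_x * (1.0 - F_x) / n


def simulate_variance(x, n, n_rep, seed=0):
    """Monte Carlo variance of F_n(x) for Uniform[0,1] data (assumed model, F(x) = x)."""
    rng = random.Random(seed)
    vals = [ecdf_at([rng.random() for _ in range(n)], x) for _ in range(n_rep)]
    mean = sum(vals) / n_rep
    return sum((v - mean) ** 2 for v in vals) / (n_rep - 1)


if __name__ == "__main__":
    print(simulate_variance(0.3, 200, 2000), asymptotic_variance(0.3, 200))
```

The two printed numbers should be close; the simulated one is only an estimate and fluctuates with the seed.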

The Bayesian counterpart of this estimator is the one derived from a Dirichlet process
prior and it is well known to be asymptotically equivalent to $F_n(x)$, see for instance (llOh. This
result is obtained using the conjugate nature of the Dirichlet prior, leading to an explicit
posterior distribution. Other frequentist estimators, based on frequentist estimates of the
density, have also been studied in the frequentist literature, in particular estimates based on
kernel estimators. Hence a natural question arises: can we generalize the Bernstein-von
Mises theorem of the Dirichlet estimator to other Bayesian estimators? What happens if the
prior has support on distributions absolutely continuous with respect to Lebesgue measure?

In this paper we provide an answer to these questions by establishing conditions under
which a Bernstein-von Mises theorem can be obtained for linear functionals of the density $f$,
such as the cumulative distribution function $F(x)$, centered at their empirical counterpart,
for instance $F_n(x)$, the empirical cumulative distribution function, when the prior puts positive
mass on densities that are absolutely continuous with respect to Lebesgue measure. We also study
cases where the asymptotic posterior distribution of the functional is not asymptotically
Gaussian but is asymptotically a mixture of Gaussian distributions with different centering
points.



1.1 Notations and aim 

In this paper, we assume that given a distribution $P$ with a compactly supported density $f$
with respect to the Lebesgue measure, $X_1, \dots, X_n$ are independent and identically distributed
according to $P$. We set $X^n = (X_1, \dots, X_n)$ and denote by $F$ the cumulative distribution
function associated with $f$. Without loss of generality we assume that for any $i$,
$X_i \in [0, 1]$ and we set

$$\mathcal{F} = \left\{ f : [0,1] \to \mathbb{R}^+ \ \text{s.t.} \ \int_0^1 f(x)\,dx = 1 \right\}.$$



We now define other notations that will be used throughout the paper. Denote by $l_n(f)$ the
log-likelihood associated with the density $f$ and, if it is parametrized by a finite dimensional
parameter $\theta$, $l_n(\theta) = l_n(f_\theta)$. For an integrable function $g$ we sometimes use the
notation $F(g) = \int_0^1 f(u)g(u)\,du$. We denote by $\langle \cdot, \cdot \rangle_f$ the inner product in

$$L_2(F) = \left\{ g : \int g^2(x) f(x)\,dx < +\infty \right\}$$

and by $\|\cdot\|_f$ the corresponding norm.

We also consider the inner product in $L_2[0,1]$, denoted $\langle \cdot, \cdot \rangle_2$, and $\|\cdot\|_2$ the
corresponding norm. When there is no ambiguity we write $\langle \cdot, \cdot \rangle$ for
$\langle \cdot, \cdot \rangle_{f_0}$ and $\|\cdot\|$ for $\|\cdot\|_{f_0}$.

Let $K(f, f')$ and $h(f, f')$ be respectively the Kullback-Leibler divergence and the Hellinger
distance between two densities $f$ and $f'$, where we recall that

$$K(f,f') = F(\log(f/f')), \qquad h(f,f') = \left[ \int \left( \sqrt{f(x)} - \sqrt{f'(x)} \right)^2 dx \right]^{1/2},$$

and define

$$V(f,f') = F\left( (\log(f/f'))^2 \right).$$
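To make the three quantities concrete, here is a small numerical sketch for densities on [0, 1], assuming $K(f,f') = F(\log(f/f'))$, $V(f,f') = F((\log(f/f'))^2)$ and the Hellinger distance without a factor 1/2. The midpoint integrator, the helper names and the two example densities (`uniform`, `tilted`) are hypothetical choices for illustration, not objects from the paper.

```python
import math


def integrate(fn, n_grid=20000):
    """Midpoint rule on [0, 1]."""
    step = 1.0 / n_grid
    return step * sum(fn((i + 0.5) * step) for i in range(n_grid))


def kullback(f, f_prime):
    """K(f, f') = int f log(f / f')."""
    return integrate(lambda x: f(x) * math.log(f(x) / f_prime(x)))


def hellinger(f, f_prime):
    """h(f, f') = [int (sqrt(f) - sqrt(f'))^2]^(1/2)."""
    return math.sqrt(integrate(lambda x: (math.sqrt(f(x)) - math.sqrt(f_prime(x))) ** 2))


def v_divergence(f, f_prime):
    """V(f, f') = int f (log(f / f'))^2."""
    return integrate(lambda x: f(x) * math.log(f(x) / f_prime(x)) ** 2)


def uniform(x):
    return 1.0


def tilted(x):
    # A simple density on [0, 1] bounded away from zero (illustrative choice).
    return 0.5 + x


if __name__ == "__main__":
    print(kullback(uniform, tilted), hellinger(uniform, tilted), v_divergence(uniform, tilted))
```

By Jensen's inequality $V(f,f') \ge K(f,f')^2$, which is one reason a concentration statement in terms of $V$ is at least as demanding as one in terms of $K$.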

Finally, let $P_0$ be the true distribution of the observations $X_i$; $f_0$ is the associated
density and $F_0$ the associated cumulative distribution function. We consider the usual notations
for the empirical process, namely

$$P_n(g) = \frac{1}{n} \sum_{i=1}^n g(X_i), \qquad G_n(g) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left[ g(X_i) - F_0(g) \right],$$

and $F_n$ the empirical distribution function.

Consider a prior $\Pi$ on the set $\mathcal{F}$. The aim of this paper is to study the posterior
distribution of $\Psi(f)$, where $\Psi$ is a continuous linear form on $L_2[0,1]$ (a typical example
is $\Psi(f) = F(x_0) = P[X \le x_0]$ for $x_0 \in \mathbb{R}$), and to derive conditions under which

$$P^\pi\left[ \sqrt{n}\,(\Psi(f) - \Psi(P_n)) \le z \mid X^n \right] \to \Phi_{V_0}(z) \quad \text{in } P_0 \text{ probability},$$

where $V_0$ is the variance of $\sqrt{n}\,\Psi(P_n)$ under $P_0$ and, for any $V$, $\Phi_V(z)$ is the
cumulative distribution function of a Gaussian random variable centered at 0 with variance $V$.



1.2 Organization of the paper 

In Section 2 we present the general Bernstein-von Mises theorem, which is given in a formal
way in the case where linear submodels are adapted to the prior. We then apply, in Section 3,
this general theorem to the case where the prior is based on infinite dimensional exponential
families. In this section, we first give general results giving the asymptotic posterior
distribution of $\Psi(f)$, which can be either Gaussian or a mixture of Gaussian distributions.
We also provide a theorem describing the posterior concentration rate under such priors (see
Section 3.2). Finally, in Section 3.4, using an example, we explain how bad phenomena can occur.
The proofs are postponed to Section 4.



2 Bernstein Von Mises theorems 

2.1 Some heuristics for proving Bernstein Von Mises theorems 

We first define some notions that are useful in the study of asymptotic properties of
semi-parametric models. These notions can be found for instance in (|2ll ).

As in Chapter 25 of ()2ll ), to study the asymptotic behaviour of semi-parametric models
we consider 1-dimensional differentiable paths locally around the true parameter $f_0$, that is
submodels of the form $u \mapsto f_u$ for $0 \le u < u_0$, for some $u_0 > 0$, such that for each path
there exists a measurable function $g$, called the score function for the submodel
$\{f_u, 0 \le u < u_0\}$ at $u = 0$, satisfying

$$\lim_{u \to 0} \int \left[ \frac{f_u^{1/2}(x) - f_0^{1/2}(x)}{u} - \frac{1}{2} g(x) f_0^{1/2}(x) \right]^2 dx = 0. \qquad (2.1)$$

We denote by $\mathcal{F}_{f_0}$ the tangent set, i.e. the collection of score functions $g$ associated
with these differentiable paths. Using (2.1), $\mathcal{F}_{f_0}$ can be identified with a subset of
$\{g \in L_2(F_0) : F_0(g) = 0\}$. For instance, when considering all probability laws, the most
usual collection of differentiable paths is given by

$$f_u(x) = c(u) f_0(x) e^{u g(x)} \qquad (2.2)$$

with $\|g\|_\infty < \infty$ and $c$ such that $c(0) = 1$ and $c'(0) = 0$. In this case, $g$ is the score
function. Note that, as explained in (j2ll ), the collection of differentiable paths of the form
$f_u(x) = 2c(u) f_0(x) (1 + \exp(-2u g(x)))^{-1}$ (with the previous conditions on $c$) leads to the
tangent space given by $\{g \in L_2(F_0) : F_0(g) = 0\}$.
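Now, consider a continuous linear form $\Psi$ on $L_2$. We can identify such a functional with a
function $\psi \in L_2$ such that for all $f \in L_2$

$$\Psi(f) = \int f(x) \psi(x)\,dx. \qquad (2.3)$$

Then for any differentiable path $u \mapsto f_u$ with score function $g$, if the function $\psi$ is
bounded on $\mathbb{R}$ (or on the support of $f_u$ for all $0 \le u < u_0$),

$$\frac{\Psi(f_u) - \Psi(f_0)}{u} = \int \psi(x) g(x) f_0(x)\,dx + \int \frac{\left( f_u^{1/2}(x) - f_0^{1/2}(x) \right)^2}{u} \psi(x)\,dx$$
$$\qquad + 2 \int \psi(x) f_0^{1/2}(x) \left[ \frac{f_u^{1/2}(x) - f_0^{1/2}(x)}{u} - \frac{1}{2} g(x) f_0^{1/2}(x) \right] dx$$
$$= \langle \psi, g \rangle + o(1).$$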

Then, we can define the efficient influence function $\tilde\psi$, belonging to
$\overline{\mathrm{lin}}(\mathcal{F}_{f_0})$ (the closure of the linear space generated by
$\mathcal{F}_{f_0}$), that satisfies for any $g \in \mathcal{F}_{f_0}$

$$\int \tilde\psi(x) g(x) f_0(x)\,dx = \int \psi(x) g(x) f_0(x)\,dx.$$

This implies:

$$\lim_{u \to 0} \frac{\Psi(f_u) - \Psi(f_0)}{u} = \langle \tilde\psi, g \rangle. \qquad (2.4)$$

The efficient influence function will play an important role for our purpose. It is also a
key notion to characterize asymptotically efficient estimators (see Section 25.3 of Jil!)).

Now, let us provide some examples by specifying different types of continuous linear forms 
that can be considered. 

Example 2.1. An important example is provided by the cumulative distribution function. If
$x_0 \in \mathbb{R}$ is fixed, consider for any density function $f \in L_2$ whose cdf is $F$,

$$\Psi(f) = \int 1_{x \le x_0} f(x)\,dx = F(x_0)$$

so that in this case $\psi(x) = 1_{x \le x_0}$, which is a bounded function, and if $\mathcal{F}_{f_0}$ is
the subspace of $L_2(F_0)$ of functions $g$ satisfying $F_0(g) = 0$ then
$\tilde\psi(x) = 1_{x \le x_0} - F_0(x_0)$.

Example 2.2. More generally, for any measurable set $A$ consider $\psi(x) = 1_{x \in A}$; then for any
density function $f \in L_2$

$$\Psi(f) = \int 1_{x \in A} f(x)\,dx$$

satisfies the above conditions and $\tilde\psi(x) = 1_{x \in A} - \int_A f_0(x)\,dx$.

Example 2.3. If $f_0$ has bounded support, say on $[0, 1]$, then the functional

$$\Psi(f) = E_f[X] = \int_0^1 x f(x)\,dx$$

satisfies the above conditions, with $\psi(x) = x$ and $\tilde\psi(x) = x - E_{f_0}[X]$.
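The examples above all subtract the $F_0$-mean from the representing function. The minimal sketch below evaluates the centred function $\psi - F_0(\psi)$ and its variance under $F_0$ numerically; for the indicator of Example 2.1 this variance equals $F_0(x_0)(1 - F_0(x_0))$. The helper names, the midpoint integrator and the uniform $f_0$ used in the usage lines are assumptions for illustration only.

```python
def integrate(fn, n_grid=10000):
    """Midpoint rule on [0, 1]."""
    step = 1.0 / n_grid
    return step * sum(fn((i + 0.5) * step) for i in range(n_grid))


def centered_influence(psi, f0):
    """Return x -> psi(x) - F0(psi), with F0(psi) = int psi f0."""
    mean = integrate(lambda x: psi(x) * f0(x))
    return lambda x: psi(x) - mean


def limiting_variance(psi, f0):
    """Variance of psi(X) under f0, i.e. F0 of the squared centred function."""
    psi_centered = centered_influence(psi, f0)
    return integrate(lambda x: psi_centered(x) ** 2 * f0(x))


def uniform(x):
    # Illustrative true density f0 on [0, 1].
    return 1.0


def cdf_indicator(x0):
    """Representing function of Example 2.1: indicator of x <= x0."""
    return lambda x: 1.0 if x <= x0 else 0.0


if __name__ == "__main__":
    print(limiting_variance(cdf_indicator(0.3), uniform))  # close to 0.3 * 0.7
    print(limiting_variance(lambda x: x, uniform))         # close to 1 / 12
```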
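In this framework, the Bernstein-von Mises theorem could be derived from the convergence
of the following Laplace transform, defined for any $t \in \mathbb{R}$ by

$$L_n(t) = E^\pi\left[ \exp\left( t\sqrt{n}\,(\Psi(f) - \Psi(P_n)) \right) \mid X^n \right]
= \frac{\int \exp\left( t\sqrt{n}\,(\Psi(f) - \Psi(P_n)) + l_n(f) - l_n(f_0) \right) d\pi(f)}{\int \exp\left( l_n(f) - l_n(f_0) \right) d\pi(f)}.$$

Now, let us set $f_{g,n} = f_u$ for $u = n^{-1/2}$. We have:

$$\sqrt{n}\,(\Psi(f_{g,n}) - \Psi(P_n)) = \sqrt{n} \int \tilde\psi(x) (f_{g,n}(x) - f_0(x))\,dx - G_n(\tilde\psi)
= \langle \tilde\psi, g \rangle + \Delta_n(g) - G_n(\tilde\psi).$$

Furthermore,

$$l_n(f_{g,n}) - l_n(f_0) = R_n(g) + G_n(g) - \frac{F_0(g^2)}{2}$$

with

$$R_n(g) = n P_n\left( \log \frac{f_{g,n}}{f_0} \right) - G_n(g) + \frac{F_0(g^2)}{2}.$$

So,

$$t\sqrt{n}\,(\Psi(f_{g,n}) - \Psi(P_n)) + l_n(f_{g,n}) - l_n(f_0)$$
$$= R_n(g) - \frac{F_0(g^2)}{2} + G_n(g - t\tilde\psi) + t\Delta_n(g) + t\langle \tilde\psi, g \rangle$$
$$= R_n(g - t\tilde\psi) + G_n(g - t\tilde\psi) - \frac{F_0((g - t\tilde\psi)^2)}{2} + \frac{t^2 F_0(\tilde\psi^2)}{2} + U_n,$$

with

$$U_n = t\Delta_n(g) + R_n(g) - R_n(g - t\tilde\psi).$$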

Lemma 25.14 of (jiH) shows that under (2.1), $R_n(g) = o(1)$, and (2.4) yields $\Delta_n(g) = o(1)$ for
a fixed $g$. This is not enough, however, to derive a Bernstein-von Mises theorem. Nonetheless, if
we can choose a prior distribution $\pi$ adapted to the previous framework to obtain uniformly

$$U_n = o(1),$$

$$t\sqrt{n}\,(\Psi(f_{g,n}) - \Psi(f)) + l_n(f_{g,n}) - l_n(f) = o(1)$$

and the equalities

$$\frac{\int e^{R_n(g - t\tilde\psi) + G_n(g - t\tilde\psi) - F_0((g - t\tilde\psi)^2)/2}\, d\pi(f)}{\int e^{R_n(g) + G_n(g) - F_0(g^2)/2}\, d\pi(f)}
= \frac{\int \exp\left( l_n(f) - l_n(f_0) \right) d\pi(f_{g - t\tilde\psi, n})}{\int \exp\left( l_n(f) - l_n(f_0) \right) d\pi(f)}
= 1 + o(1),$$

then

$$L_n(t) = \exp\left( \frac{t^2 F_0(\tilde\psi^2)}{2} \right) (1 + o(1)).$$

In this case, our goal is reached. However, it is not obvious that a given prior $\pi$ satisfies all
these properties. In particular, in a nonparametric framework, the property $R_n(g) = o(1)$
uniformly over a set whose posterior probability goes to 1 is usually not satisfied. We thus
consider an alternative approach based on linear submodels.
2.2 Bernstein Von Mises under linear submodels

In this section we study the case where linear local models are adapted to the prior. More
precisely, we assume that $\|\log(f_0)\|_\infty < \infty$ so, for each density function $f$, we define $h$
such that for any $x$,

$$h(x) = \sqrt{n} \log\left( \frac{f(x)}{f_0(x)} \right) \quad \text{or equivalently} \quad f(x) = f_0(x) \exp\left( \frac{h(x)}{\sqrt{n}} \right).$$

For the sake of clarity, we sometimes write $f_h$ instead of $f$ and $h_f$ instead of $h$ to underline
the relationship between $f$ and $h$. Note that in this context $h$ is not the score function since
$F_0(h) \ne 0$. It would be equivalent to consider local models of the form $f = f_0(1 + h/\sqrt{n})$,
except that we would have to impose constraints on $h$ for $f$ to be positive. We consider a
continuous linear form $\Psi$ on $L_2$ and a function $\psi$ such that (2.3) is satisfied for any
$f \in L_2$, and we set for any $x$,

$$\psi_c(x) = \psi(x) - F_0(\psi). \qquad (2.5)$$

Note that $\psi_c$ coincides with the influence function $\tilde\psi$ associated with the tangent set
$\{g \in L_2(F_0) : F_0(g) = 0\}$. Then we consider the following assumptions.

(A1) The posterior distribution concentrates around $f_0$. More precisely, there exists $u_n = o(1)$
such that if $A^1_{u_n} = \{f \in \mathcal{F} : V(f_0, f) \le u_n^2\}$, the posterior distribution of
$A^1_{u_n}$ satisfies

$$P^\pi\{A^1_{u_n} \mid X^n\} = 1 + o_{P_0}(1).$$

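(A2) The posterior distribution of the subset $A_n \subset A^1_{u_n}$ of densities such that

$$\int \left| \log\left( \frac{f(x)}{f_0(x)} \right) \right|^3 (f_0(x) + f(x))\,dx = o(1) \qquad (2.6)$$

satisfies

$$P^\pi[A_n \mid X^n] = 1 + o_{P_0}(1).$$

(A3) Let

$$R_n(h) = \sqrt{n}\,F_0(h) + \frac{F_0(h^2)}{2}$$

and for any $x$,

$$\bar\psi_{t,n}(x) = \psi_c(x) + \frac{\sqrt{n}}{t} \log\left( F_0\left[ \exp\left( \frac{h}{\sqrt{n}} - \frac{t\psi_c}{\sqrt{n}} \right) \right] \right).$$

We have

$$\frac{\int_{A_n} \exp\left( -\frac{F_0((h_f - t\bar\psi_{t,n})^2)}{2} + G_n(h_f - t\bar\psi_{t,n}) + R_n(h_f - t\bar\psi_{t,n}) \right) d\pi(f)}{\int_{A_n} \exp\left( -\frac{F_0(h_f^2)}{2} + G_n(h_f) + R_n(h_f) \right) d\pi(f)} = 1 + o_{P_0}(1). \qquad (2.7)$$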
Before stating our main result, let us discuss these assumptions. Condition (A1) concerns
concentration rates of the posterior distribution, and there now exists a large literature on
such results. See for instance (0) or ([^) for general results. The difficulty here comes from
the use of $V$ instead of the Hellinger or the $L_1$-distance. However, since $u_n$ does not need to
be optimal, deriving rates in terms of $V$ from those in terms of the Hellinger distance is often
not a problem (see below).

Condition (A2) is a refinement of (A1) but can often be derived from (A1), as illustrated
below.

The main difficulty comes from condition (A3). To prove it, we need to be able to construct
a transformation $T$ such that $T f_h = f_{h - t\bar\psi_{t,n}}$ exists and such that the prior is hardly
modified by this transformation. In parametric setups, continuity of the prior near the true value
is enough to ensure that the prior would hardly be modified by such a transform, and this
remains true in semi-parametric setups where we can write the parameter as $(\theta, \eta)$, where
$\theta$ is the parameter of interest and is finite dimensional. Indeed, as shown in (0), under certain
conditions the transformations can be transferred to transformations on $\theta$, which is finite
dimensional. Here this is more complex since $T$ is a transformation on $f$, which is infinite
dimensional, so that a condition of the form $d\pi(Tf) = d\pi(f)(1 + o(1))$ does not necessarily
make sense. We study this aspect in more detail in Section 3.

Now, we can state the main result of this section. 

Theorem 2.1. Let $f_0$ be a density on $\mathcal{F}$ such that $\|\log(f_0)\|_\infty < \infty$ and
$\|\psi\|_\infty < \infty$. Assume that (A1), (A2) and (A3) are true. Then, if

$$\Psi(P_n) = P_n(\psi) = \frac{\sum_{i=1}^n \psi(X_i)}{n},$$

we have for any $z$, in probability with respect to $P_0$,

$$P^\pi\left\{ \sqrt{n}\,(\Psi(f) - \Psi(P_n)) \le z \mid X^n \right\} - \Phi_{F_0(\psi_c^2)}(z) \to 0.$$

The proof of Theorem 2.1 is given in Section 4.1.

Sieve priors lead to interesting behaviours of the posterior distribution as illustrated in 
the following section. Indeed they have a behaviour which is half way between parametric 
and non parametric. We illustrate these features in the following two sections. 



3 Bernstein Von Mises theorem under infinite dimensional 
exponential families 

In this section, we study a specific class of priors based on infinite dimensional exponential
families on the following class of densities supported by $[0, 1]$:

$$\mathcal{F} = \left\{ f \ge 0 : f \text{ is 1-periodic}, \ \int_0^1 f(x)\,dx = 1, \ \log(f) \in L_2([0,1]) \right\}.$$

We assume that $f_0 \in \mathcal{F}$ and we consider two types of orthonormal bases defined in the
following section, namely the Fourier and wavelet bases.
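3.1 Orthonormal bases

Fourier bases constitute unconditional bases of periodized Sobolev spaces $W^\gamma$, where $\gamma$ is
the smoothness parameter. Our results are also valid for a wide range of Besov spaces. In this
case, we consider wavelet bases which allow for the following expansions:

$$f(x) = \theta_{-10} 1_{[0,1]}(x) + \sum_{j=0}^{+\infty} \sum_{k=0}^{2^j - 1} \theta_{jk} \Psi_{jk}(x), \qquad x \in [0, 1],$$

where $\theta_{-10} = \int_0^1 f(x)\,dx$ and $\theta_{jk} = \int_0^1 f(x) \Psi_{jk}(x)\,dx$. We recall that the
functions $\Psi_{jk}$ are obtained by periodizing dilations and translations of a mother wavelet
$\Psi$ that can be assumed to be supported by the compact set $[-A, A]$:

$$\Psi_{jk}(x) = \sum_{l=-\infty}^{+\infty} 2^{j/2} \Psi(2^j x - k + 2^j l), \qquad x \in [0, 1].$$

If $\Psi$ belongs to the Hölder space $C^r$ and has $r$ vanishing moments, then the wavelet basis
constitutes an unconditional basis of the Besov space $B^\gamma_{p,q}$ for $1 \le p, q \le +\infty$ and
$\max(0, 1/p - 1/2) < \gamma < r$. In this case, $B^\gamma_{p,q}$ is the set of functions $f$ of
$L_2[0, 1]$ such that $\|f\|_{\gamma,p,q} < \infty$, where

$$\|f\|_{\gamma,p,q} = \begin{cases}
|\theta_{-10}| + \left[ \sum_{j \ge 0} 2^{jq(\gamma + \frac{1}{2} - \frac{1}{p})} \left( \sum_{k=0}^{2^j - 1} |\theta_{jk}|^p \right)^{q/p} \right]^{1/q} & \text{if } q < \infty, \\
|\theta_{-10}| + \sup_{j \ge 0} 2^{j(\gamma + \frac{1}{2} - \frac{1}{p})} \left( \sum_{k=0}^{2^j - 1} |\theta_{jk}|^p \right)^{1/p} & \text{if } q = \infty.
\end{cases}$$

We refer the reader to (fl^ ) for a good review of wavelets and Besov spaces. We just mention
that Besov spaces include in particular Sobolev spaces ($W^\gamma = B^\gamma_{2,2}$) and, when $\gamma$ is
not an integer, Hölder spaces ($C^\gamma = B^\gamma_{\infty,\infty}$). To shorten notations, the orthonormal
basis will be denoted $(\phi_\lambda)_{\lambda \in \mathbb{N}}$, where $\phi_0 = 1_{[0,1]}$ and

- for the Fourier basis, for $\lambda \ge 1$,

$$\phi_{2\lambda - 1}(x) = \sqrt{2} \sin(2\pi\lambda x), \qquad \phi_{2\lambda}(x) = \sqrt{2} \cos(2\pi\lambda x);$$

- for the wavelet basis, if $\lambda = 2^j + k$, with $j \in \mathbb{N}$ and $k \in \{0, \dots, 2^j - 1\}$,

$$\phi_\lambda = \Psi_{jk}.$$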
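The Fourier indexing above (odd indices for sines, even indices for cosines, index 0 for the constant) can be checked numerically. The sketch below builds the first few basis functions and verifies that their Gram matrix in $L_2[0,1]$ is close to the identity; the function names and the grid size are illustrative assumptions.

```python
import math


def fourier_basis(lam):
    """phi_0 = 1, phi_{2m-1} = sqrt(2) sin(2 pi m x), phi_{2m} = sqrt(2) cos(2 pi m x)."""
    if lam == 0:
        return lambda x: 1.0
    m = (lam + 1) // 2  # frequency attached to index lam
    if lam % 2 == 1:
        return lambda x: math.sqrt(2.0) * math.sin(2.0 * math.pi * m * x)
    return lambda x: math.sqrt(2.0) * math.cos(2.0 * math.pi * m * x)


def inner_product(u, v, n_grid=2000):
    """Midpoint approximation of the L2[0, 1] inner product of u and v."""
    step = 1.0 / n_grid
    return step * sum(u((i + 0.5) * step) * v((i + 0.5) * step) for i in range(n_grid))


def gram_matrix(k):
    """Gram matrix of (phi_0, ..., phi_k); close to the identity for an orthonormal family."""
    basis = [fourier_basis(lam) for lam in range(k + 1)]
    return [[inner_product(a, b) for b in basis] for a in basis]


if __name__ == "__main__":
    for row in gram_matrix(4):
        print([round(entry, 6) for entry in row])
```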

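Now, the decomposition of each periodized function $f \in L_2[0, 1]$ on $(\phi_\lambda)_{\lambda \in \mathbb{N}}$
is written as follows:

$$f(x) = \sum_{\lambda \in \mathbb{N}} \theta_\lambda \phi_\lambda(x), \qquad x \in [0, 1],$$

where $\theta_\lambda = \int_0^1 f(x) \phi_\lambda(x)\,dx$. Recall that when the Fourier basis is used, $f$
lies in $W^\gamma$ for $\gamma > 0$ if and only if $\|f\|_\gamma < \infty$, where

$$\|f\|_\gamma = \left( \theta_0^2 + \sum_{\lambda \in \mathbb{N}^*} |\lambda|^{2\gamma} \theta_\lambda^2 \right)^{1/2}.$$

We use $\|\cdot\|_\gamma$ and $\|\cdot\|_{\gamma,p,q}$ to define the radius of the balls of $W^\gamma$ and
$B^\gamma_{p,q}$ respectively. We now present the general result on posterior concentration rates
associated with such prior models.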
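3.2 Posterior rates

Assume that $f_0 \in \mathcal{F}$ and let $\Phi = (\phi_\lambda)_\lambda$ be one of the orthonormal bases
introduced in Section 3.1; then

$$\log(f_0) - \int_0^1 \log(f_0(x))\,dx = \sum_{\lambda \in \mathbb{N}^*} \theta_{0\lambda} \phi_\lambda.$$

Set $\theta_0 = (\theta_{0\lambda})_{\lambda \in \mathbb{N}^*}$ and define $c(\theta_0) = -\int_0^1 \log(f_0(x))\,dx$; we have

$$f_0(x) = \exp\left( \sum_{\lambda \in \mathbb{N}^*} \theta_{0\lambda} \phi_\lambda(x) - c(\theta_0) \right).$$

We consider the following family of models: for any $k \in \mathbb{N}^*$, we set

$$\mathcal{F}_k = \left\{ f_\theta = \exp\left( \sum_{\lambda=1}^k \theta_\lambda \phi_\lambda - c(\theta) \right) : \theta \in \mathbb{R}^k \right\},$$

where

$$c(\theta) = \log\left( \int_0^1 \exp\left( \sum_{\lambda=1}^k \theta_\lambda \phi_\lambda(x) \right) dx \right). \qquad (3.1)$$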
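The role of the normalizing term $c(\theta)$ in (3.1) can be seen in a short numerical sketch: it is exactly the constant that makes $f_\theta$ integrate to one. The code below uses the Fourier basis and a midpoint rule; all function names, the grid sizes and the coefficient vector in the usage lines are assumptions made for illustration.

```python
import math


def fourier_basis(lam):
    """phi_{2m-1} = sqrt(2) sin(2 pi m x), phi_{2m} = sqrt(2) cos(2 pi m x), lam >= 1."""
    m = (lam + 1) // 2
    if lam % 2 == 1:
        return lambda x: math.sqrt(2.0) * math.sin(2.0 * math.pi * m * x)
    return lambda x: math.sqrt(2.0) * math.cos(2.0 * math.pi * m * x)


def exponent(theta, x):
    """sum_{lam=1}^k theta_lam phi_lam(x) for theta = (theta_1, ..., theta_k)."""
    return sum(t * fourier_basis(lam)(x) for lam, t in enumerate(theta, start=1))


def log_normalizer(theta, n_grid=2000):
    """c(theta) = log int_0^1 exp(sum theta_lam phi_lam(x)) dx, by the midpoint rule."""
    step = 1.0 / n_grid
    total = sum(math.exp(exponent(theta, (i + 0.5) * step)) for i in range(n_grid))
    return math.log(step * total)


def density(theta, n_grid=2000):
    """Return f_theta(x) = exp(sum theta_lam phi_lam(x) - c(theta))."""
    c = log_normalizer(theta, n_grid)
    return lambda x: math.exp(exponent(theta, x) - c)


if __name__ == "__main__":
    f = density([0.5, -0.3, 0.2])
    print(f(0.0), f(0.5), log_normalizer([0.5, -0.3, 0.2]))
```

Since every $\phi_\lambda$ with $\lambda \ge 1$ integrates to zero, Jensen's inequality gives $c(\theta) \ge 0$, with equality at $\theta = 0$ (the uniform density).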



So, we define a prior $\pi$ on the set $\mathcal{F} = \cup_k \mathcal{F}_k$ by defining a prior $p$ on
$\mathbb{N}^*$ and then, once $k$ is chosen, we fix a prior $\pi_k$ on $\mathcal{F}_k$. Such priors are often
considered in the Bayesian nonparametric literature. See for instance (jl9l ). The special case of
log-spline priors has been studied by (0) and (U), whereas the prior considered by (jH) is based on
Legendre polynomials. For the wavelet case, (I 111 ) considered the special case of the Haar basis.

Since one of the key conditions needed to obtain a Bernstein-von Mises theorem is a
concentration rate of the posterior distribution of order $\epsilon_n$, we first give two general
results on concentration rates of posterior distributions based on the two different setups of
orthonormal bases: the Fourier basis and the wavelet basis. These results are of independent
interest, since we obtain in such contexts optimal adaptive rates of convergence. In a similar
spirit, I19I ) considers infinite dimensional exponential families and derives minimax and adaptive
posterior concentration rates. Her work differs from the following theorem in two main aspects.
Firstly, she restricts her attention to the case of Sobolev spaces and the Fourier basis, whereas
we consider Besov spaces. Secondly, she obtains adaptivity by putting a prior on the smoothness of
the Sobolev class, whereas we obtain adaptivity by constructing a prior on the size $k$ of the
parametric spaces, which in our opinion is a more natural approach. Moreover, (0) merely
considers Gaussian priors. Also related to this problem is the work of (fill ), who derives a
general framework to obtain adaptive posterior concentration rates and applies her results to
the Haar basis case. The limitation in her case, apart from the fact that she considers the
Haar basis and no other wavelet basis, is that she constrains the $\theta_j$'s in each
$k$-dimensional model to belong to a ball with fixed radius.

Now, we specify the conditions on the prior $\pi$:

Definition 3.1. Let $1 > \beta > 1/2$ be fixed and let $g$ be a continuous and positive density on
$\mathbb{R}$ bounded (up to a constant) by the function $M_{p_*}(x) = \exp(-c|x|^{p_*})$ for positive
constants $c, p_*$, and assume that for all $M > 0$ there exist $a, b$ such that

$$g(y + u) \ge a \exp\left\{ -b\left( |y|^{p_*} + |u|^{p_*} \right) \right\}, \qquad \forall |y| \le M, \ \forall u \in \mathbb{R}.$$

The prior $p$ on $k$ satisfies one of the following conditions:

[Case (PH)] There exist two positive constants $c_1$ and $c_2$ such that for any $k \in \mathbb{N}^*$,

$$\exp(-c_1 k L(k)) \le p(k) \le \exp(-c_2 k L(k)), \qquad (3.2)$$

where $L$ is the function that can be either $L(x) = 1$ or $L(x) = \log(x)$.

[Case (D)] If $k_n^* = n^{1/(2\beta+1)}$,

$$p(k) = \delta_{k_n^*}(k).$$

Conditionally on $k$ we define the prior on $\mathcal{F}_k$ by assuming that the prior distribution
$\pi_k$ on $\theta = (\theta_\lambda)_{1 \le \lambda \le k}$ is given by

$$\frac{\theta_\lambda}{\sqrt{\tau_\lambda}} \sim g, \qquad \tau_\lambda = \tau_0 \lambda^{-2\beta}, \qquad \text{i.i.d.},$$

where $\beta < 1/2 + p_*/2$ if $p_* \le 2$ and $\beta < 1/2 + 1/p_*$ if $p_* > 2$.

Observe that we do not necessarily consider Gaussian priors since we allow for densities
$g$ to have different tails. The prior on $k$ can be non random, which corresponds to the Dirac
case (D). For the case (PH), $L(x) = \log(x)$ corresponds typically to a Poisson prior on $k$
and the case $L(x) = 1$ corresponds typically to hypergeometric priors. Now, we have the
following result.

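Theorem 3.1. Assume that $\|\log(f_0)\|_\infty < \infty$ and that there exists $\gamma > 1/2$ such that
$\log(f_0) \in B^\gamma_{p,q}$, with $p \ge 2$ and $1 \le q \le \infty$. Then,

$$P^\pi\left\{ f_\theta : h(f_0, f_\theta) \le \frac{\log n}{L(n)} \epsilon_n \mid X^n \right\} = 1 + o_P(1), \qquad (3.3)$$

and

$$P^\pi\left\{ f_\theta : \|\theta_0 - \theta\|_2 \le \frac{(\log n)^2}{L(n)} \epsilon_n \mid X^n \right\} = 1 + o_P(1), \qquad (3.4)$$

where in case (PH),

$$\epsilon_n = \epsilon_0 \left( \frac{\log n}{n} \right)^{\frac{\gamma}{2\gamma+1}},$$

in case (D),

$$\epsilon_n = \epsilon_0 \log n \; n^{-\frac{\beta}{2\beta+1}}, \quad \text{if } \gamma \ge \beta,$$

$$\epsilon_n = \epsilon_0 \, n^{-\frac{\gamma}{2\beta+1}}, \quad \text{if } \gamma < \beta,$$

and $\epsilon_0$ is a constant large enough.

The proof of Theorem 3.1 is given in Section 4.2.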

Remark 1. If the density $g$ only satisfies a tail condition of the form

$$g(x) \le C_g |x|^{-p_*}, \qquad |x| \text{ large enough},$$

with $p_* > 1$, then, in case (PH), if $\gamma > 1$ the rates defined by (3.3) and (3.4) remain valid.

Remark 2. Note that in the case (PH) the posterior concentration rate is, up to a $\log n$ term, the
minimax rate of convergence on the collection of spaces with smoothness $\gamma > 1/2$, whereas in
the case (D) the minimax rate is achieved only when $\gamma = \beta$.
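To compare the two regimes discussed in Remark 2 numerically, the sketch below evaluates the rates under the assumption that they read $\epsilon_n = \epsilon_0 (\log n / n)^{\gamma/(2\gamma+1)}$ in case (PH) and, in case (D), $\epsilon_0 \log n \, n^{-\beta/(2\beta+1)}$ for $\gamma \ge \beta$ and $\epsilon_0 n^{-\gamma/(2\beta+1)}$ for $\gamma < \beta$. These formulas are read off a partly garbled statement of Theorem 3.1, so they are an assumption of this example; the function names and the default $\epsilon_0 = 1$ are likewise hypothetical.

```python
import math


def rate_ph(n, gamma, eps0=1.0):
    """Assumed adaptive rate in case (PH): eps0 * (log n / n)^(gamma / (2 gamma + 1))."""
    return eps0 * (math.log(n) / n) ** (gamma / (2.0 * gamma + 1.0))


def rate_d(n, gamma, beta, eps0=1.0):
    """Assumed rate in case (D), where the prior smoothness beta fixes the dimension."""
    if gamma >= beta:
        return eps0 * math.log(n) * n ** (-beta / (2.0 * beta + 1.0))
    return eps0 * n ** (-gamma / (2.0 * beta + 1.0))


def minimax_rate(n, gamma):
    """Usual nonparametric minimax rate n^(-gamma / (2 gamma + 1))."""
    return n ** (-gamma / (2.0 * gamma + 1.0))


if __name__ == "__main__":
    n = 10 ** 6
    for gamma, beta in [(0.6, 0.9), (0.9, 0.9), (2.0, 0.9)]:
        print(gamma, beta, rate_ph(n, gamma), rate_d(n, gamma, beta), minimax_rate(n, gamma))
```

Under these assumed formulas, the (PH) rate tracks the minimax rate up to a logarithmic factor for every $\gamma$, while the (D) rate does so only when $\gamma = \beta$.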
3.3 Bernstein Von Mises under these models

In this section, we apply Theorem 2.1 of Section 2.2 to establish the following Bernstein-von
Mises-type result. For this purpose, let us expand the function $\psi_c$ defined in (2.5) on the
basis $(\phi_\lambda)_{\lambda \in \mathbb{N}}$:

$$\psi_c = \sum_{\lambda \in \mathbb{N}} \psi_{c,\lambda} \phi_\lambda.$$

We denote by $\Pi_{f_0,k}$ the projection operator on the vector space generated by
$(\phi_\lambda)_{0 \le \lambda \le k}$ for the scalar product $\langle f, g \rangle = F_0(fg)$, and
$\Delta_k = \psi_c - \Pi_{f_0,k}\psi_c$. So we can write for any $x \in [0, 1]$,

$$\Pi_{f_0,k}\psi_c(x) = \psi_{\Pi,c,0} + \sum_{\lambda=1}^k \psi_{\Pi,c,\lambda} \phi_\lambda(x),$$

since $\phi_0(x) = 1$. We denote by $B_{n,k}$ the renormalized sequence of coefficients that appear in
the above sum:

$$B_{n,k} = \frac{\psi_{\Pi,c,[k]}}{\sqrt{n}} = \left( \frac{\psi_{\Pi,c,\lambda}}{\sqrt{n}} \right)_{1 \le \lambda \le k}.$$

Such quantities will play a key role in the sequel. Let Iq > he large enough so that 



L{n) 



for some positive c > 0, where e„ is the posterior concentration rate defined in Theorem 13.11 
and define /„ = l^ne^/ L{n). In the case (D) we set In = k^- In the following, in the case (D), 
whenever a statement concerns k < it is to be understood as k = In- 
We have the following result. 

Theorem 3.2. Let us assume that the prior is defined as in Definition \3.1\ and for all i G M, 

1 < k < In (or kn in case (D)), assume that 

^fc(^) 1 , (^\ -f \^ia a \2^(fog^)^ /o r:\ 

— - — -- — - = 1 + o 1), if yiOj - 9oj) < en 3.5 

7rk{9 - tBn,k) jr^ L{n) 

uniformly over {9; \\9 — 9q\\2 < 3(logn)^e„}. Assume also that 

r- 2 ] ■ (^-6) 

Under assumptions of Theorem \3.1\ 
• for all z £R 

¥^[V^{^{f)-^{Pn))<z\X^] = Y.p{k\X^)<^VoAz + fin,k) + ord'^), 

k 

(3.7) 

where 
(3.8)



• In the case (D), ifj>P, 

F^[V^{^{f)-^iPn))<z\X^] = $yjz) + opo(l), 
where Vq = Fo(V'c)- 

The Bernstem-Von Mises property obtained in the case (D) is deduced by proving relation 
(13. 6p in this case. Indeed if 7 > /? there exists a > such that \/nef^ < n~", besides 

||^'0cj<Ajl|oo < 1 + II V'cj'/'jiloo 
j>k j^k 

< C log n 

and Ylj>k+i'^cj — 0(1/^)) so that relation (13. 6|) is satisfied. Apart from this argument the 



proof of Theorem 13.21 is given in Section 14. 3[ The first part of Theorem 13.21 shows that 
the posterior distribution of •yn(^'(/) — is asymptotically a mixture of Gaussian 

distributions with variances Vq — Fo(A^) and mean values ^n,k with weight p(fc|X'"). To 
obtain an asymptotic Gaussian distribution with mean zero and variance Vq it is necessary for 
fin.k to be small whenever p{k\X"') is not. The conditions given in the second part of Theorem 
13.21 ensure that this is the case, however they are not necessary conditions. Nevertheless, in 
Section 13.41 we give a counter-example for which the Bernstein- Von Mises property is not 
satisfied in the cases (PH) and (D) with 7 < /3. 

We now discuss condition (13. 5p in three different examples. Note first that An C {0; \\9 — 
Bob < 3(logn)2e„} with 9 G Bfc, k < /„. 

• Gaussian: If g is Gaussian then for all k < In (or fc* in the case of a type (D) prior) and 
ah j < k, Oj ~ AA(0, Tq^j-^^) and for all OeAnDTk 



< 



Ck^f 



n 



E 



< 0{n^^-ht') = 0(1) 

,■2/3 



'oj-)V'c,j^^ + E-=iS-V'c,-j 



2/3 



n 

< — 

= o{l). 
This implies that uniformly over 

TTk{0 - Bn,k) 

• Laplace: If g is Laplace, g{x) oc e"'^'. 



/n 



9o||A:2'3 + (fc2/3-7 + i) 



log g 



Oj - til,, 



7rfc(0)(l + o(l)) 



log g 



< C 



n 



So that 



log 



vrfc(^) 



< C 



E-=i/l^, 



n 



In 
for all $\gamma > 1/2$, $1 > \beta > 1/2$ in the cases (D) and (PH), and condition (3.5) is satisfied.

• Student: In the Student case for g we can use the calculations made in the Gaussian 
case since 

k 

log (i + cff'e]) - log (i + cf'ie, - t^cj/V^f) 

= 0(1) 

Therefore in all these cases condition (3.5) is satisfied.

Interestingly Theorem 13 . 2 1 shows that parametric sieve models (increasing sequence of models) 
have a behaviour which is a mix between parametric and nonparametric models. Indeed if the 
posterior distribution puts most of its mass on fc's large enough the posterior distribution has 
a Bernstein Von Mises property centered on the empirical (nonparametric MLE) estimator 
with the correct variance whereas if it allows for /c's that are not large enough (corresponding 
to T/j=if^[{&3 - t^Pcj/Vnf - 0]] or not smah enough) then the posterior distribution is 
not asymptotically Gaussian with the right centering, nor with the right variance. An extreme 
case corresponds to the situation where Fo(A^) / o(l) under the posterior distribution, which 
is equivalent to 

3ko, s.t. Ve > liminf„^ooPo l^'^ [^ol^"] > e] > 0. 

For each k > fixed, if infggjgfe K{fQ, fg) > 0, since the model is regular, there exists c > such 
that P[f [P'' > 6-"^=] 1. Therefore, i^o(A^) / o(l) under the posterior distribution if 

there exists /cq such that inig^^ko i^(/o, fe) > 0, i.e. if there exists On G M'^" such that /o = fog. 
In that case it can be proved that P'^[A;o|-^"] = 1 + op{l), see (j^), and the Bernstein Von 
Mises theorem to be expected is the parametric one, under the model Q^o which is regular. 
However, even if = op(l), the posterior distribution might not satisfy the non parametric 
Bernstein Von Mises property with the correct centering. We illustrate in the following section 
this issue in the special case of the cumulative distribution function. 

3.4 An example: the cumulative distribution function 

As a special case, consider the functional on $f$ to be the cumulative distribution function
calculated at a given point $x_0$. As seen in Section 2, $\psi_c(x) = 1_{x \le x_0} - F_0(x_0)$. We have $F_n(x_0) = P_n(\psi)$
and recall that the variance of $G_n(\psi_c)$ under $P_0$ is equal to $V_0 = F_0(x_0)(1 - F_0(x_0))$.
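
The variance formula $V_0 = F_0(x_0)(1 - F_0(x_0))$ can be checked exactly for finite $n$. The sketch below is an illustration only: it assumes i.i.d. observations, so that $n F_n(x_0)$ is Binomial$(n, p)$ with $p = F_0(x_0)$, and the function name `scaled_ecdf_variance` is hypothetical, not taken from the paper.

```python
import math

def scaled_ecdf_variance(n, p):
    """Exact variance of sqrt(n) * (F_n(x0) - F_0(x0)).

    Assumption for this sketch: n * F_n(x0) ~ Binomial(n, p) with p = F_0(x0),
    which holds for i.i.d. observations.
    """
    var = 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        # The centred variable has mean zero, so the second moment is the variance.
        var += pmf * (math.sqrt(n) * (k / n - p)) ** 2
    return var

p = 0.3  # illustrative value of F_0(x0)
values = [scaled_ecdf_variance(n, p) for n in (5, 50, 500)]
```

Each entry of `values` agrees with `p * (1 - p)` up to floating-point error, whatever the sample size, which is the variance the Bernstein-Von Mises statement has to reproduce.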

As an illustration, consider the case of the Fourier basis. The case of wavelet bases is
dealt with in the same way. In other words, for $\lambda \ge 1$, $\phi_{2\lambda-1}(x) = \sqrt{2}\sin(2\pi\lambda x)$,
$\phi_{2\lambda}(x) = \sqrt{2}\cos(2\pi\lambda x)$ and $\phi_0(x) = 1$.
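
As a quick sanity check of this normalisation, the following sketch approximates the Gram matrix of the first few Fourier basis functions on $[0,1]$. The helper names `phi` and `inner` and the grid size are illustrative choices, not part of the paper.

```python
import math

def phi(j, x):
    """Fourier basis on [0, 1]: phi_0 = 1, phi_{2l-1} = sqrt(2) sin(2 pi l x),
    phi_{2l} = sqrt(2) cos(2 pi l x)."""
    if j == 0:
        return 1.0
    lam = (j + 1) // 2  # frequency attached to index j
    if j % 2 == 1:
        return math.sqrt(2.0) * math.sin(2.0 * math.pi * lam * x)
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * lam * x)

def inner(i, j, m=2000):
    """Midpoint-rule approximation of the L2([0,1]) inner product of phi_i and phi_j."""
    return sum(phi(i, (t + 0.5) / m) * phi(j, (t + 0.5) / m) for t in range(m)) / m

# Gram matrix of phi_0, ..., phi_4; it should be close to the identity.
gram = [[inner(i, j) for j in range(5)] for i in range(5)]
```

The matrix `gram` is numerically the identity, so the coefficients $\theta_\lambda$ of a log-density in this basis are ordinary $L_2$ coordinates.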

Corollary 3.1. If the prior density $g$ on the coefficients is Gaussian or Laplace, then if
$f_0 \in \mathcal{S}_\gamma$ with $\gamma > \beta$ and if the prior on $k$ is the Dirac mass on $k_n^*$, then the posterior
distribution of $\sqrt{n}(F(x_0) - F_n(x_0))$ is asymptotically Gaussian with mean 0 and variance $V_0$.
If the prior density $g$ is Student and if $\gamma > \beta > 1$, then the same result remains valid.
This result is a direct application of Theorem 3.2.

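Counter-example: In this remark we illustrate the fact that in the case of a random $k$,
which leads to an adaptive minimax rate of convergence for the posterior distribution, we
might not have a Bernstein-Von Mises theorem. Consider a density $f_0$ in the form

$$f_0 = \exp\Big( \sum_{j \ge k_0} \theta_{0,j}\phi_j(u) - c(\theta_0) \Big)$$

where $k_0$ is fixed but can be large and $\theta_{0,2j} = 0$ and

$$\theta_{0,2j-1} = \sin(2\pi j x) / \big[ j^{\gamma+1/2} \sqrt{\log j}\, \log\log j \big].$$

Then for $J_1 > 3$

$$\sum_{j > J_1} j^{2\gamma}\theta_{0,2j-1}^2 \le \sum_{j > J_1} \frac{1}{j \log j (\log\log j)^2} \le \int_{J_1}^{\infty} \frac{dx}{x \log x (\log\log x)^2} = \frac{1}{\log\log J_1}$$

and similarly

$$\sum_{j > J_1} \theta_{0,2j-1}^2 \le \sum_{j > J_1} \frac{1}{j^{2\gamma+1} \log j (\log\log j)^2} \le \int_{J_1}^{\infty} \frac{dx}{x^{2\gamma+1} \log x (\log\log x)^2} = \frac{1}{2\gamma J_1^{2\gamma} \log J_1 (\log\log J_1)^2}(1 + o(1)) \qquad (3.9)$$

when $J_1 \to \infty$.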
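
The first tail bound rests on comparing a sum with the integral of the decreasing function $x \mapsto 1/(x \log x (\log\log x)^2)$, whose antiderivative is $-1/\log\log x$. A minimal numerical sketch of that comparison follows; the function names and the truncation level `n_max` are illustrative assumptions.

```python
import math

def f(x):
    """Integrand 1 / (x log x (log log x)^2); positive and decreasing for x >= 3."""
    return 1.0 / (x * math.log(x) * math.log(math.log(x)) ** 2)

def tail_sum(j1, n_max):
    """Truncated tail sum of f(j) over j1 < j <= n_max."""
    return sum(f(j) for j in range(j1 + 1, n_max + 1))

def integral_bound(j1, n_max):
    """Exact integral of f over [j1, n_max], from the antiderivative -1 / log log x."""
    return 1.0 / math.log(math.log(j1)) - 1.0 / math.log(math.log(n_max))
```

For instance, `tail_sum(10, 10000)` stays below `integral_bound(10, 10000)`, which in turn is below $1/\log\log 10$, in line with the display above.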

Consider a Poisson distribution on $k$ with parameter $\nu > 0$ fixed; then for such $f_0$, if
$k_n = n^{1/(2\gamma+1)}(\log n)^{-2/(2\gamma+1)}(\log\log n)^{-2/(2\gamma+1)}$ and $k_1$ is large enough,

$$P^\pi[k \le k_1 k_n | X^n] = 1 + o(1).$$

We now study the mean terms $\mu_{n,k}$ and we show that if $k \le k_1 k_n$, $\mu_{n,k} \neq o(1)$, nor can
$P^\pi[k | X^n]$ be neglected.
First note that when $k \to \infty$, $G_n(\Delta_\psi) = o(1)$

/^n,fc = \/nFo[{'4)c-Tif^^kilJc){lo-^h,klo)] 



( E V'cj0i)(^o - n/o,fcZo) 



n y [(v-c - n/(,^fc''/'c)(^o - n/„,fc/o)] 
+^/^ j ifo - 1) [(^c - n/„,fcVc)(Zo - n/o,fc/o)] 



(3.10) 
We first consider the first term of the right hand side of (3.10).



IJ"n,k,l 



j=k+l 



j=k+l 

E 



sin^(27rx/) 



l>k/2 



{21 + 1)7+3/2 log(2/ + 1)1/2 loglog(2/ + 1) 



and if X = 1/4 we have 



1 

(4i + 3)7+3/2 (log 4j + 3)1/2 loglog(4j + 3) 

^-7-1/2 



Vlog A: log log k 

fc-7-1/2 



^log k log log A; 
Note that there exists c > such that for all k < kn 



n. 



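We now consider the second term of (3.10). Let $\Pi_{1,k}$ denote the projection on the span of $(\phi_j)_{j \le k}$
with respect to the scalar product $\langle f, g \rangle_2 = \int f(u)g(u)\,du$ and note that

$$\Pi_{f_0,k} l_0 = \Pi_{1,k} l_0 + \Pi_{f_0,k}\Big[ \sum_{j=k+1}^{\infty} \theta_{0j}\phi_j \Big].$$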



\l^n,k,: 



n J ifo - 1) 
n [ifo- 1) 

n [ifo- 1) 



i=fc+i 



+ 



j=k+i 



( Y V'j0i)(A'/i,fc^o - n/o,fc/o) 

1/2 



oo \ / oo 



< 2|/o - l|oo 5^ V'cj 

\i=fc+l 

/^-7-l/2 

< C^|/o-l| 



"v/log A; log log A; 
By choosing $k_0$ large enough, $\|f_0 - 1\|_\infty$ can be made as small as need be so that we finally
obtain that there exists $c > 0$ such that for all $k \le k_n$

$$\mu_{n,k} \ge c\sqrt{\log n}.$$
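
To see where the $\sqrt{\log n}$ order comes from, one can plug $k = k_n$ into a lower bound of order $\sqrt{n}\,k^{-\gamma-1/2}/(\sqrt{\log k}\,\log\log k)$. The sketch below assumes the bound has exactly this form with constants set to 1 (an assumption for illustration; the constants in the text are unspecified), and the function names are hypothetical.

```python
import math

def k_n(n, gamma):
    """Cut-off k_n = n^{1/(2g+1)} (log n)^{-2/(2g+1)} (log log n)^{-2/(2g+1)}."""
    e = 2.0 / (2.0 * gamma + 1.0)
    return n ** (1.0 / (2.0 * gamma + 1.0)) * math.log(n) ** (-e) * math.log(math.log(n)) ** (-e)

def bias_order(n, gamma):
    """Assumed order of the bias term at k = k_n (all constants set to 1)."""
    k = k_n(n, gamma)
    return math.sqrt(n) * k ** (-gamma - 0.5) / (math.sqrt(math.log(k)) * math.log(math.log(k)))

# Ratio of the assumed bias order to sqrt(log n) for gamma = 1 and increasing n.
ratios = [bias_order(n, 1.0) / math.sqrt(math.log(n)) for n in (1e8, 1e12, 1e16)]
```

The ratios stay bounded away from zero over this range, consistent with a centering term of order at least $\sqrt{\log n}$ that does not vanish.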

Note that in case (D) with $\gamma < \beta$, the same calculations lead to

$$\mu_{n,k_n^*} \ge c\, n^{\frac{\beta-\gamma}{2\beta+1}} (\log n)^{-1/2} (\log\log n)^{-1}.$$

Thus in this case the posterior distribution is not asymptotically Gaussian with mean $F_n(x)$
and variance $F_0(x)(1 - F_0(x))/n$. Whether it is asymptotically equivalent to a mixture of
Gaussians is not clear. It would be a consequence of the way the posterior distribution of $k$
concentrates as $n$ goes to infinity. In the case (D), the posterior distribution is asymptotically
Gaussian with mean $F_n(x) - \mu_{n,k_n^*}/\sqrt{n}$.

4 Proofs 

In this section we prove Theorems 2.1, 3.1 and 3.2. In the sequel, $C$ denotes a generic positive
constant whose value is of no importance.

4.1 Proof of Theorem 2.1

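Let $Z_n = \sqrt{n}(\Psi(f) - \Psi(P_n))$. We have

$$P^\pi\{A_n | X^n\} = 1 + o_{P_0}(1). \qquad (4.1)$$

So, it is enough to prove that conditionally on $A_n$ and $X^n$, the distribution of $Z_n$ converges
to the distribution of a Gaussian variable whose variance is $F_0(\psi_c^2)$. This will be established
if for any $t \in \mathbb{R}$,

$$\lim_{n \to \infty} L_n(t) = \exp\Big( \frac{t^2}{2} F_0[\psi_c^2] \Big), \qquad (4.2)$$

where $L_n(t)$ is the Laplace transform of $Z_n$ conditionally on $A_n$ and $X^n$:

$$L_n(t) = E^\pi\big[ \exp(t\sqrt{n}(\Psi(f) - \Psi(P_n))) \,\big|\, A_n, X^n \big] = \frac{E^\pi\big[ \exp(t\sqrt{n}(\Psi(f) - \Psi(P_n))) 1_{A_n}(f) \,\big|\, X^n \big]}{P^\pi\{A_n | X^n\}} = \frac{\int_{A_n} \exp\big( t\sqrt{n}(\Psi(f) - \Psi(P_n)) + l_n(f) - l_n(f_0) \big)\, d\pi(f)}{\int_{A_n} \exp\big( l_n(f) - l_n(f_0) \big)\, d\pi(f)}.$$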

We set for any x. 



so. 
which implies that



fix) - Mx) = Mx) ( ^ + 



n n 



and 



\ n 



Since 



we have 



Inif) - Inik) = -^^^ + Gn{h) + Rn{h), 



J^^ exp (-Mi!) + + dirif) 



where straightforward computations show that 

^2 _ _ ^ 

Un,h = tFo{h{i)^ - i;t,n)) + -l^Foii^L) + Rn{h) - Rn{h " tV^t.n) + -^Fo{h^ Bh,nA) 

2 ' V" 

t 



tFo{hiPc)+tV^Fo{tl;t,n) + -^Fo{h'^Bh,nA) 



exp 



= tFo(/iVc) +nlog (^Fo 
Now, let us study each term of the last expression. We have 



+ -^Fo {h^Bh,n^c) 



exp 



n Jn 



= Fn 



+ 0(n 2) 



/n 



H Fo 

2n " 



+ 0(n" 



So, 



n 



n 



, Fo[/lVc'] ^ Fo[/l2i?^,„Vc'] 



n 



n 



Note that, on $A_n$, we have $F_0(h^2) = O(n u_n^2)$ and $F_0(h^2 B_{h,n}) = o(n)$. Therefore, uniformly
on $A_n$,



exp 



h tipc 



n \ n 



n 



n 



2. , Fo[/lVc'] , Fo[h^Bh,ni^, 



+ 



n 



n n 

■ Fo[h^Bh,ni^,] tFo(V;g) 
Foihtpcl H T= 7: ho(l) 



) +o(n-i) 



n 



1 + o n 



,-1/2 



and 

log ^Fo 
Finally, 



exp 



h tipc 



n \ n 



FoihtPc) + 



Fo[h^Bh,ni^,] tF^i^l 



n 



+ o(l). 



Un,h = -^Fo [V','] + 0(1) 



and up to a multiplicative factor equal to 1 + o(l), 



Ln{t) = exp ( —Fq 



/a„ exp 



Fo((fe-#t.„)2) 



f^^^ exp (-^ + + Rn{h) ) dvr(/) 



Finally (A3) implies (4.2) and the theorem is proved.



4.2 Proof of Theorem 3.1

We first give a preliminary lemma which will be used extensively in the sequel. 

4.2.1 Preliminary lemma 

Let us first state the following lemma. 

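Lemma 4.1. Set $K_n = \{1, 2, \ldots, k_n\}$ with $k_n \in \mathbb{N}^*$. Assume either of the following two cases:

- $\gamma > 0$, $p = q = 2$ when $\Phi$ is the Fourier basis

- $0 < \gamma < r$, $2 \le p \le \infty$, $1 \le q \le \infty$ when $\Phi$ is the wavelet basis with $r$ vanishing
moments.

Then the following results hold.

- There exists a constant $c_{1,\Phi}$ depending only on $\Phi$ such that for any $\theta = (\theta_\lambda)_\lambda \in \mathbb{R}^{k_n}$,

$$\Big\| \sum_{\lambda \in K_n} \theta_\lambda \phi_\lambda \Big\|_\infty \le c_{1,\Phi} \sqrt{k_n}\, \|\theta\|_{\ell_2}. \qquad (4.4)$$

- If $\log(f_0) \in B^{\gamma}_{p,q}(R)$, then there exists $c_{2,\gamma}$ depending on $\gamma$ only such that

$$\sum_{\lambda \notin K_n} \theta_{0\lambda}^2 \le c_{2,\gamma} R^2 k_n^{-2\gamma}. \qquad (4.5)$$

- If $\log(f_0) \in B^{\gamma}_{p,q}(R)$ with $\gamma > 1/2$, then there exists $c_{3,\Phi,\gamma}$ depending on $\Phi$ and $\gamma$ only such that

$$\Big\| \sum_{\lambda \notin K_n} \theta_{0\lambda} \phi_\lambda \Big\|_\infty \le c_{3,\Phi,\gamma} R\, k_n^{1/2-\gamma}. \qquad (4.6)$$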



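Proof. Let us first consider the Fourier basis. We have:

$$\Big\| \sum_{\lambda \in K_n} \theta_\lambda \phi_\lambda \Big\|_\infty \le \|\phi\|_\infty \sum_{\lambda \in K_n} |\theta_\lambda|,$$

which, by the Cauchy-Schwarz inequality, proves (4.4). Inequality (4.5) follows from the definition of $B^{\gamma}_{2,2} = W^{\gamma}$. To prove (4.6),
we use the following inequality: for any $x$,

$$\Big| \sum_{\lambda \notin K_n} \theta_{0\lambda} \phi_\lambda(x) \Big| \le \|\phi\|_\infty \sum_{\lambda \notin K_n} |\theta_{0\lambda}| \le \|\phi\|_\infty \Big( \sum_{\lambda \notin K_n} \lambda^{2\gamma} \theta_{0\lambda}^2 \Big)^{1/2} \Big( \sum_{\lambda \notin K_n} \lambda^{-2\gamma} \Big)^{1/2}.$$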



Now, we consider the wavelet basis. Without loss of generality, we assume that $\log_2(k_n + 1) \in \mathbb{N}^*$.
We have for any $x$,



J2 ^a</'a(x) 



X£Kn 



,xeK„ 



,xeKr 



< 



2^-1 

11^2 1 E E •^fc(^) 

^0<i<log2(A:„) fc=0 



Since ^{x) = for x ^ [-^4, A], 

card {k G {0, . . . , 2^' - 1} : I'jkix) / 0} < 3{2A + 1). 



see (|l5l ). p. 282 or (|16| ), p. 112). So, there exists op depending only on ^ such that 



E ^a0a(x) 


< ll^ll^. 1 


Aeii-n 





E 3(2^ + 1)2^4 

^0<i<log2{fc„) 
which proves (4.4). For the second point, we just use the inclusion $B^{\gamma}_{p,q}(R) \subset B^{\gamma}_{2,\infty}(R)$ and

2^-1 



'oA 



j>log2(fc„) fc = i>log2(fcn) 



1.-27 



Finally, for the last point, we have for any x: 



'2^-1 



< 



E E 

i>log2{fc„) \ fc=0 



k=0 



where C < R{3{2A + l))^c^(l - 2^-^)^i. 
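
For the Fourier basis, an inequality of the form $\|\sum_{\lambda \in K_n} \theta_\lambda \phi_\lambda\|_\infty \le c\sqrt{k_n}\|\theta\|_{\ell_2}$ can be checked numerically. The sketch below uses $c = \sqrt{2}$, which follows from the Cauchy-Schwarz inequality and $|\phi_\lambda| \le \sqrt{2}$; the random coefficients, grid size and function names are illustrative assumptions.

```python
import math
import random

def phi(j, x):
    """Fourier basis function with index j >= 1 on [0, 1]."""
    lam = (j + 1) // 2
    if j % 2 == 1:
        return math.sqrt(2.0) * math.sin(2.0 * math.pi * lam * x)
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * lam * x)

def sup_ratio(theta, grid=1000):
    """Grid approximation of sup_x |sum_j theta_j phi_j(x)| / (sqrt(k) * ||theta||_2)."""
    k = len(theta)
    sup = 0.0
    for t in range(grid + 1):
        x = t / grid
        val = abs(sum(th * phi(j + 1, x) for j, th in enumerate(theta)))
        sup = max(sup, val)
    l2 = math.sqrt(sum(th * th for th in theta))
    return sup / (math.sqrt(k) * l2)

random.seed(0)  # fixed seed so the illustration is reproducible
ratios = [sup_ratio([random.gauss(0.0, 1.0) for _ in range(k)]) for k in (2, 8, 32)]
```

Every ratio is at most $\sqrt{2}$, whatever the coefficients, which is the dimension-dependent sup-norm control that the proofs below rely on.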



4.2.2 Proof of Theorem 3.1

Denote for any $n$,

$$B_n(\epsilon_n) = \{ f \in \mathcal{F} : K(f_0, f) \le \epsilon_n^2, \ V(f_0, f) \le \epsilon_n^2 \}.$$

To prove Theorem 3.1 we use the following version of the theorem on posterior convergence
rates. Its proof is not given, but it is a slight modification of Theorem 2.4 of (0).

Theorem 4.1. Let $f_0$ be the true density. We assume that there exists a constant $c$ such that
for any $n$, there exists $\mathcal{F}_n^* \subset \mathcal{F}$ and a prior $\pi$ on $\mathcal{F}$ satisfying the following conditions:

-(A) 

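- (B) For any $j \in \mathbb{N}^*$, let

$$S_{n,j} = \{ f \in \mathcal{F}_n^* : j\epsilon_n < h(f_0, f) \le (j+1)\epsilon_n \},$$

and $H_{n,j}$ the Hellinger metric entropy of $S_{n,j}$. There exists $J_{0,n}$ (that may depend on
$n$) such that for all $j \ge J_{0,n}$,

$$H_{n,j} \le (K - 1) n j^2 \epsilon_n^2,$$

where $K$ is an absolute constant.
- (C) Let

$$B_n(\epsilon_n) = \{ f \in \mathcal{F} : K(f_0, f) \le \epsilon_n^2, \ V(f_0, f) \le \epsilon_n^2 \}.$$

Then,

$$\pi(B_n(\epsilon_n)) \ge e^{-c n \epsilon_n^2}.$$

We have:

$$P^\pi\{ f : h(f_0, f) \le J_{0,n}\epsilon_n \,|\, X^n \} = 1 + o_P(1).$$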
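To prove Theorem 3.1 it is thus enough to prove that conditions (A), (B) and (C) of
the previous result are satisfied. We consider $(\Lambda_n)_n$ the increasing sequence of subsets of $\mathbb{N}^*$
defined by $\Lambda_n = \{1, 2, \ldots, l_n\}$ with $l_n \in \mathbb{N}^*$. For any $n$, we set:

$$\mathcal{F}_n^* = \Big\{ f_\theta \in \mathcal{F}_{l_n} : f_\theta = \exp\Big( \sum_{\lambda \in \Lambda_n} \theta_\lambda \phi_\lambda - c(\theta) \Big), \ \|\theta\|_{\ell_2} \le w_n \Big\},$$

with

$$w_n = \exp\big( w_0 n^{\rho} (\log n)^{q} \big), \quad \rho > 0.$$

Recall that

- $\epsilon_n = \epsilon_0 n^{-\gamma/(2\gamma+1)} (\log n)^{\gamma/(2\gamma+1)}$ in case (PH)

- $\epsilon_n = \epsilon_0 n^{-\beta/(2\beta+1)}$ in case (D).

Define $l_n$ by

$$l_n = \frac{l_0 n \epsilon_n^2}{L(n)},$$

where $l_0$ is some positive constant. When $\gamma, \beta > 1/2$, we have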

/ ^ n 

Proof of condition (A): We have, since X^/jTfc < 00 



< CexpH„L(/„))+ ^P^l^ > 

< Cexp {-lonel) + J] (exp > exp (^) 



< C exp (-/one„) + C/„ exp ^ 

< C exp (-Zo"'^) + C exp (-n^) 
for any positive $H > 0$. Hence,

$$\pi\{ \mathcal{F}_n^{*c} \} \le C \exp\big( -(l_0 - 1) n \epsilon_n^2 \big)$$

and Condition (A) is proved.



(4.7) 



(4.8) 
Proof of condition (B): We apply Lemma 4.1 with $K_n = \Lambda_n$ and $k_n = l_n$. For this
purpose, we show that the Hellinger distance between two functions of $\mathcal{F}_n^*$ is related to the
$\ell_2$-distance of the associated coefficients. So, let us consider $f_\theta$ and $f_{\theta'}$ belonging to $\mathcal{F}_n^*$ with

$$f_\theta = \exp\Big( \sum_{\lambda \in \Lambda_n} \theta_\lambda \phi_\lambda - c(\theta) \Big), \qquad f_{\theta'} = \exp\Big( \sum_{\lambda \in \Lambda_n} \theta'_\lambda \phi_\lambda - c(\theta') \Big).$$



Let us assume that \\9' — 9\\i-^ < ci€nln with ci a positive constant, then using ()4.4p and 



AeA„ 



< Cx/lnW - 9\\i, < C^/lnW - 9\U, < Ccien ^ 



and 









\ci9)-c{9')\ = 


log 


/e(x)exp ^ 








< 


log 








^ AgA„ 



AeA„ 



< 



C\\ E (^A - ^a)<Aa||oo. 

AeA„ 



Then, 

h\fe,fe') 



= I fe{x) l^exp E (^A - Ox)Mx) + \ {c{9) - c(0')) j " 1 j ^^a: 
< l^exp (^\\ E K - - 



AGAn 

< Cln\\9- 



yii2 



/||2 



(4. 



The next lemma establishes a converse inequality. 

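Lemma 4.2. There exists a constant $c \le 1/2$ depending on $\gamma$, $R$ and $\Phi$ such that if

$$(j+1)^2 \epsilon_n^2 l_n \le c \times \min\big( c_0, (1 - e^{-1})^2 \big)$$

then for $f_\theta \in S_{n,j}$,

$$\|\theta_0 - \theta\|_{\ell_2}^2 \le \frac{1}{c_0 c} (\log n)^2 h^2(f_0, f_\theta).$$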



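Proof. Using Theorem 5 of ([23]), with $M_1^2 = \int f_0^2(x)/f_\theta(x)\,dx$, if

$$h^2(f_0, f_\theta) \le \frac{1}{2}(1 - e^{-1})^2,$$

we have

$$V(f_0, f_\theta) \le 5 h^2(f_0, f_\theta) \big( |\log M_1| - \log(h(f_0, f_\theta)) \big)^2. \qquad (4.10)$$

But

$$M_1^2 = \int_0^1 f_0(x) \exp\Big( \sum_{\lambda \in \Lambda_n} (\theta_{0\lambda} - \theta_\lambda)\phi_\lambda(x) + \sum_{\lambda \notin \Lambda_n} \theta_{0\lambda}\phi_\lambda(x) - c(\theta_0) + c(\theta) \Big) dx$$

$$\le \int_0^1 f_0(x) \exp\Big( C\big[ \sqrt{l_n}\|\theta_0 - \theta\|_{\ell_2} + R\, l_n^{1/2-\gamma} \big] - c(\theta_0) + c(\theta) \Big) dx,$$

by using (4.4) and (4.6). Furthermore,

$$|c(\theta_0) - c(\theta)| \le C\big[ \sqrt{l_n}\|\theta_0 - \theta\|_{\ell_2} + R\, l_n^{1/2-\gamma} \big]. \qquad (4.11)$$

So,

$$|\log M_1| \le C\big[ \sqrt{l_n}\|\theta_0 - \theta\|_{\ell_2} + R\, l_n^{1/2-\gamma} \big].$$

Finally, since $f_\theta \in S_{n,j}$ for $j \ge 1$,

$$V(f_0, f_\theta) \le 5 h^2(f_0, f_\theta) \Big( C\big[ \sqrt{l_n}\|\theta_0 - \theta\|_{\ell_2} + R\, l_n^{1/2-\gamma} \big] - \log(\epsilon_n) \Big)^2 \le C h^2(f_0, f_\theta) \big( l_n \|\theta_0 - \theta\|_{\ell_2}^2 + (\log n)^2 \big).$$

Since $f_0(x) \ge c_0$ for any $x$ and $\int_0^1 \phi_\lambda(x)\,dx = 0$ for any $\lambda \in \Lambda$, we have

$$V(f_0, f_\theta) \ge c_0 \|\theta_0 - \theta\|_{\ell_2}^2. \qquad (4.12)$$

Combining (4.9) and (4.12), we conclude that

$$\|\theta_0 - \theta\|_{\ell_2}^2 \le C (\log n)^2 h^2(f_0, f_\theta),$$

if $h^2(f_0, f_\theta) l_n \le (j+1)^2 \epsilon_n^2 l_n \le 1/(2C)$. Lemma 4.2 is proved by taking $c = (\max(C, 1))^{-1}/2$.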
■ 

Now, under the assumptions of Lemma 4.2, using (4.9), we obtain

Hn,, < log ((CZ„(j + l)logn)'") < /„log (Ce^iyi^logn) . 

Then, since $l_n L(n) = l_0 n \epsilon_n^2$, we have

$$H_{n,j} \le (K - 1) n j^2 \epsilon_n^2$$

as soon as

$$j^2 \ge \frac{j_0 \log n}{L(n)},$$

where $j_0$ is a constant, and condition (B) is satisfied for such $j$'s. Now, let $j$ be such that

^2,2 r ^ ^ Co 1 , „-1n2 



c(j + l)^6^/„>min(^,-(l-e-^)^). (4.13) 
In this case, since for $f_\theta \in \mathcal{F}_n^*$,

< \/^ll^lk2 — V^Wn, 

for n large enough, 

$$H_{n,j} \le \log\big( (C l_n w_n \epsilon_n^{-1})^{l_n} \big) \le 2 l_n \log(w_n) \le 2 w_0 l_n n^{\rho} (\log n)^{q}.$$

Then, using (4.13), condition (B) is satisfied if $w_0$ and $q$ are small enough and if

which is true for $n$ large enough, since $\gamma, \beta > 1/2$, for $\rho$ small enough.

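Proof of condition (C): Let $k_n \in \mathbb{N}$, going to $\infty$, and $K_n = \{1, \ldots, k_n\}$; we assume that $\theta$
belongs to $A(u_n)$ where

$$A(u_n) = \Big\{ \theta : \theta_\lambda = 0 \ \text{for every} \ \lambda \notin K_n \ \text{and} \ \sum_{\lambda \in K_n} (\theta_{0\lambda} - \theta_\lambda)^2 \le u_n^2 \Big\}, \qquad (4.14)$$

where $u_n$ goes to 0 such that

$$\sqrt{k_n}\, u_n \to 0. \qquad (4.15)$$

We define for any $\lambda \in \Lambda$,

$$\beta_\lambda(f_0) = \int \phi_\lambda(x) f_0(x)\,dx.$$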
Let us introduce the following notations: 







foK^ = exp Ooxcpxix) - c{eoKj , foK„ = exp ^ 6'oa<^a(2;) - cie^K^ 

\Ae-fC„ / \A^ii-„ 

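We have

$$K(f_0, f_{0K_n}) = \sum_{\lambda \notin K_n} \theta_{0\lambda}\beta_\lambda(f_0) + c(\theta_{0K_n}) - c(\theta_0) = \sum_{\lambda \notin K_n} \theta_{0\lambda}\beta_\lambda(f_0) + \log\Big( \int_0^1 f_0(x) e^{-\sum_{\lambda \notin K_n} \theta_{0\lambda}\phi_\lambda(x)}\,dx \Big).$$

Using inequality (4.6) of Lemma 4.1 and a Taylor expansion of the exponential function, we obtain

$$\int_0^1 f_0(x) e^{-\sum_{\lambda \notin K_n} \theta_{0\lambda}\phi_\lambda(x)}\,dx = 1 - \sum_{\lambda \notin K_n} \theta_{0\lambda}\beta_\lambda(f_0) + \frac{1}{2} \int_0^1 f_0(x) \Big( \sum_{\lambda \notin K_n} \theta_{0\lambda}\phi_\lambda(x) \Big)^2 dx \times (1 + o(1)).$$

We have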



E ^oa/?a(/o) 



< 



E 



/0II2 I 7 , l^OA 



and 



r/o(x)( 



So, 



log ( r /o(x)e-SA^^n^oA</'AW^3.^ = _ ^ 0oa/3a(/o) ' I i ^oa/3a(/o) 



So, finally, 



I f foi^) ( E ^oa</'a(x) ) dx-U^, ^oa/3a(/o) ) +0 ( E 



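This implies that for $n$ large enough,

$$K(f_0, f_{0K_n}) \le \|f_0\|_\infty \sum_{\lambda \notin K_n} \theta_{0\lambda}^2 \le D k_n^{-2\gamma}.$$

Now, if $f_\theta \in \mathcal{F}_{k_n}$ with $f_\theta = \exp\big( \sum_{\lambda \in K_n} \theta_\lambda \phi_\lambda - c(\theta) \big)$, we have

$$K(f_0, f_\theta) = K(f_0, f_{0K_n}) + \sum_{\lambda \in K_n} (\theta_{0\lambda} - \theta_\lambda)\beta_\lambda(f_0) - c(\theta_{0K_n}) + c(\theta)$$

$$\le D k_n^{-2\gamma} + \sum_{\lambda \in K_n} (\theta_{0\lambda} - \theta_\lambda)\beta_\lambda(f_0) - c(\theta_{0K_n}) + c(\theta).$$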



We set for any $x$,

$$T(x) = \sum_{\lambda \in K_n} (\theta_\lambda - \theta_{0\lambda})\phi_\lambda(x).$$

Using (4.4),

$$\|T\|_\infty \le C\sqrt{k_n}\, u_n \to 0.$$

So,

$$\int_0^1 f_{0K_n}(x) \exp(T(x))\,dx = 1 + \int_0^1 f_{0K_n}(x) T(x)\,dx + \int_0^1 f_{0K_n}(x) T^2(x) v(n, x)\,dx,$$

where $v$ is a bounded function. Since $\log(1 + u) \le u$ for any $u > -1$, for $\theta \in A(u_n)$ and $n$
large enough,

$$-c(\theta_{0K_n}) + c(\theta) = \log\Big( \int_0^1 f_{0K_n}(x) e^{T(x)}\,dx \Big) \le \int_0^1 f_{0K_n}(x) T(x)\,dx + \int_0^1 f_{0K_n}(x) T^2(x) v(n, x)\,dx.$$



xeKr 



So, 



K{fo, fe) < Dk-^^ + (^OA - ^a) (/?a(/o) - (^xUokJ) 



XeKn 



< Dk-^"' + UnWfo - f0Kj2 



Using ()4.6p . we have 





r / 


s ii/oIil/I 


1 — exp 







(x) - c{9okJ + ci9o) dx. 



and 



Finally, 



and 



\c{9oK„) - c{eo)\ < II ^ fi'oAV^Alloo- 

ll/o - foKjh < D\\ J2 (^oxMo. < Dkl 

X<^K„ 

K{hJe)<Dk-^^ + Dunkr\ 



(4.16) 



We now bound $V(f_0, f_\theta)$. For this purpose, we refine the control of $|c(\theta_{0K_n}) - c(\theta_0)|$:
|c(0oi^J-c(eo)| = 



log / /o(x)exp - V 9Qx(t>x{x) dx 
log [ fo{x) (l- 9ox4>x{^)+Hn,x) I ^ ^oa</'a(x) ) ) 



dx 



where w is a bounded function. So, 



\c{9okJ - c{9o)\ < dIy. I^oa/3a(/o)| + / ( E 



a(x) dx 



< ^ I E ^OA I < Dk-^- 



27 



In addition, 

AGA'„ 

< Un (ll/o - /oiTnlb + II/0II2) + DknU^n 

< DUn + DknU^n 

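Finally,

$$V(f_0, f_\theta) \le u_n^2 + D k_n^{-2\gamma} + D k_n u_n^2. \qquad (4.17)$$

Now, let us consider the case (PH). We take $k_n$ and $u_n$ such that

$$k_n^{-\gamma} \le k_0 \epsilon_n \quad \text{and} \quad u_n = u_0 \epsilon_n k_n^{-1/2}, \qquad (4.18)$$

where $k_0$ and $u_0$ are constants depending on $\|f_0\|_\infty$, $\gamma$, $R$ and $\Phi$. If $k_0$ and $u_0$ are small enough,
then, by using (4.16) and (4.17),

$$K(f_0, f_\theta) \le \epsilon_n^2 \quad \text{and} \quad V(f_0, f_\theta) \le \epsilon_n^2.$$

So, Condition (C) is satisfied if

$$P^\pi\{ A(u_n) \} \ge e^{-c n \epsilon_n^2},$$

where $A(u_n)$ is defined in (4.14). We have: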

{A{un)} > F^h: (0A - Ooxf <ul \ X exp (-ciA:„L(A;„)} 

[ xeKn J 

The prior on 9 implies that 
Pi = P 

> P 



J^A — t/QA 



AeA' 



AG/C„ 



< Tn 'tin 



> 



1 -1 n 9{x\)dxx 
1 ^ n (yA + ^(r"A'30oAVyA. 

AG -ft:,, 



Using (j4.5p . when 7 > /?, we have sup;^^^^ 



1 



To ^A/^^OA 



< 00 and since 



sup <! Tq ^ n„ > < 00 



(4.19) 
using assumptions on the prior, there exists a constant such that



Pi > L>? 







n dy> 



xeKr 



> exp (-L'4A:„logn) , 



(4.20) 



where is a constant. When 7 < /3, since there exists a,b > such that V|y| < M for some 
positive M 

g{y + u) > aexp(— 6|n|^ ) 



using the above calculations we obtain if < 2 

Pi > Dl-e^^{-CY,X^*^W]Y,j ...jl 



exp {—Dikn log n) 



> exp [-Cfc^P*/2+/3-7 

> exp(-(Z)4 + l)fc„logn) if/3<l/2 + p72 
and if t > 2 



n dyx 



Pi > Z)^"exp{-C ^ AP*^|0OAr*}exp(-P)4A:„logn) 

> exp (- {D4 + 1)A;„ log n) if /? < 1/2 + 1/p* 

So, Condition (C) is established as soon as $D_4 k_n \log n \le c n \epsilon_n^2$. Using (4.18), this can be
satisfied if and only if we take $k_n$ such that



k '^'< f < k < 



D4 log n 

which is possible if and only if $\epsilon_0$ is large enough. In particular, this implies that

log 71 \ 27+1 



(4.21) 



sup < e„ 



n 



< 00. 



Note that when $k_n$ satisfies (4.21), Conditions (4.15) and (4.19) are satisfied as well.
Similar computations show the result for the case (D).



4.3 Proof of Theorem 3.2

Our goal is to prove conditions (A1), (A2) and (A3) of Section 2.2 to apply Theorem 2.1. Let
$\epsilon_n$ be the posterior concentration rate as obtained in Theorem 3.1.

Let us consider $f = f_\theta \in \mathcal{F}_k$ for $1 \le k \le l_n$, where $l_n = l_0 n \epsilon_n^2 / L(n)$ in the case of type
(PH) priors and $l_n = k_n^*$ in the case of type (D) priors. First, using the same upper bound as
in the proof of Lemma 4.2, we have

$$V(f_0, f) \le 2C (\log n)^2 \epsilon_n^2, \qquad (4.22)$$

as soon as $h(f_0, f) \le \epsilon_n$. Thus, using (3.4), we have

$$P^\pi\{ A_{u_n} | X^n \} = 1 + o_P(1)$$

with $u_n^2 = u_0 (\log n)^2 \epsilon_n^2$, for a constant $u_0$ large enough. Note that we can restrict ourselves
to $A_{u_n} \cap (\cup_{k \le l_n} \mathcal{F}_k)$, since $P^\pi[(\cup_{k \le l_n} \mathcal{F}_k)^c] \le e^{-c n \epsilon_n^2}$ for any $c > 0$ by choosing $l_0$ large enough,
see the proof of Theorem 3.1.

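To establish (A2), we observe that

$$\|\log f_\theta - \log f_0\|_\infty \le \Big\| \sum_{\lambda \in \mathbb{N}^*} (\theta_{0\lambda} - \theta_\lambda)\phi_\lambda \Big\|_\infty + |c(\theta) - c(\theta_0)| \le C\big[ \sqrt{l_n}\|\theta_0 - \theta\|_{\ell_2} + R\, l_n^{1/2-\gamma} \big] = o(1),$$

by using Lemma 4.1 and (4.11). So, (A2) is implied by (A1). Now, let us establish (A3).
Denote by $A_n$ the set defined in assumption (A2) and restricted to $(\cup_{k \le l_n} \mathcal{F}_k)$. For any $t$, we
study the term

$$\frac{\int_{A_n} \exp\Big( -\frac{F_0((h_f - t\psi_{t,n})^2)}{2} + G_n(h_f - t\psi_{t,n}) + R_n(h_f - t\psi_{t,n}) \Big) d\pi(f)}{\int_{A_n} \exp\Big( -\frac{F_0(h_f^2)}{2} + G_n(h_f) + R_n(h_f) \Big) d\pi(f)}$$

$$= \frac{\sum_{1 \le k \le l_n} p(k) \int_{A_n \cap \mathcal{F}_k} \exp\Big( -\frac{F_0((h_f - t\psi_{t,n})^2)}{2} + G_n(h_f - t\psi_{t,n}) + R_n(h_f - t\psi_{t,n}) \Big) d\pi_k(f)}{\sum_{1 \le k \le l_n} p(k) \int_{A_n \cap \mathcal{F}_k} \exp\Big( -\frac{F_0(h_f^2)}{2} + G_n(h_f) + R_n(h_f) \Big) d\pi_k(f)}.$$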
If we set 



n 



^ A=l 



we have using (j4.4p and since k < In'. 

\\bn,k,tlQO 



tVk 

< lin/o,fcV^c - V'n,c.o||/o 



< 



/co^/n 
2,t\/Tfi 11 



co^/n 



OiCn). 



for Co a constant. Recall that for fg & J^k, 



hg = ^/ni ^ ( 

Vagn* 



7A - Oox)(l)x - c{e) + c(^o) and B^^k 



V'n,c,[A;] 



so, for 6' = 6 - tBn^k, with iJ„ = {hg - t'il)c)/^/n and = ipc - '^fo,kipc 

hg' = hg - \/nbn,k,t + Vn{ciO) - c{9 - tBn,k)) 

Fo(e'^"+*'^'*/v^ 

hg - tilJt,n + Kil^c - Yifo.k'^c) - Vn\o^ 
hg - ttpt^n + tA^ - An, 
with



A/nlog 



Now, M.llj) implies jhe 



< Vken = o(l) and since ^(A^) = 0(1), ||Av,||oo = 0(^1^) 



Foie 



,2J|A^llc 
'jJ' 



3/2 



Also, for any function v satisfying i^od-yj) < oo 



Note that in the case v = 1 since Fq^c'^^^^') = 1 we can be more precise and obtain 



1 



tFo{hei^c) 



n 



In n 



1 + 



Moreover 



Fo(^;e'^«/v^) =Fo(t;)+o(Fo(|t;|)). 
Therefore using (031) with v = A| leads to 

and using ()4.23p with f = A^ together with (I4.24p and using ()4.25p 

f2 



(4.24) 



(4.25) 



^Fo(e^"A^) = * Fo(A^e'^«/v^)--Fo(A^Vc) + of-y 



Also 



Fo (/igA^) + — Fo (/i^5/,,,„A^) 



where Bh,n is defined by ([O]). Since Fo(e''»/v^^c) = Fo(^c) + o(l) = o(l), we thus obtain 
using the fact that Fo(e^") = 1 + o (^-^ and Fo(?/'cA^) = Fo(Ap 



Fo(e^^") 



1 + ^Fo ( e'^^/v^A 



n 



2n 



Fo{Al)+. 
and finally,



An = ^/nlog 



Fo [hlBhg,nA^) t 



n 



+ o{n 



-1/5 



Moreover 



and by using (4.22),



77, 



< C||A,/,||ooV^(log77) 



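To bound $\|\Delta_\psi\|_\infty$, we write

$$\Delta_\psi = \psi_{+k} - \Pi_{f_0,k}\psi_{+k},$$

where $\psi_{+k}$ is a linear function of the $\phi_j$'s for $j \ge k + 1$. Then by using (4.4),

$$\|\Delta_\psi\|_\infty \le \|\psi_{+k}\|_\infty + \|\Pi_{f_0,k}\psi_{+k}\|_\infty \le \|\psi_{+k}\|_\infty + C\sqrt{k}\, \|\psi_{+k}\|_{f_0}$$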



< 



+ 



2- 



Under the assumption that 



SupdlV'+fcioo + VkWtlj+kh) = o ■ "2 ; , 

fc<in V V'^e^ (logr7j 



1 



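we obtain that

$$\sqrt{n}\,\Delta_n = t F_0(h_\theta \Delta_\psi) - \frac{t^2}{2} F_0(\Delta_\psi^2) + o(1). \qquad (4.26)$$

Note that $\Delta_n = o(1)$. Finally,

$$R_n(h_{\theta'}) = \sqrt{n} F_0(h_{\theta'}) + \frac{F_0(h_{\theta'}^2)}{2}$$

$$= R_n(h_\theta - t\psi_{t,n}) - \sqrt{n}\,\Delta_n - \frac{t^2}{2} F_0(\Delta_\psi^2) + t F_0(h_\theta \Delta_\psi) - \Delta_n F_0(h_\theta) + o(1)$$

$$= R_n(h_\theta - t\psi_{t,n}) - \Delta_n F_0(h_\theta) + o(1).$$

Recall that $h_{\theta'} = h_\theta - t\psi_{t,n} + t\Delta_\psi - \Delta_n$, $\Delta_n = o(1)$ and $F_0(\Delta_\psi) = 0$. Note also that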



M^) = Mx) + ^ log (i^o(e^")) = M^) + o(l) 
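so that $F_0(\Delta_\psi \psi_{t,n}) = F_0(\Delta_\psi^2) + o(1)$ and

$$-\frac{F_0(h_{\theta'}^2)}{2} = -\frac{F_0((h_\theta - t\psi_{t,n})^2)}{2} - \frac{F_0((t\Delta_\psi - \Delta_n)^2)}{2} - F_0\big( (h_\theta - t\psi_{t,n})(t\Delta_\psi - \Delta_n) \big)$$

$$= -\frac{F_0((h_\theta - t\psi_{t,n})^2)}{2} + \frac{t^2 F_0(\Delta_\psi^2)}{2} - t F_0(h_\theta \Delta_\psi) + \Delta_n F_0(h_\theta) + o(1).$$

Furthermore,

$$G_n(h_{\theta'}) = G_n(h_\theta - t\psi_{t,n}) + t G_n(\Delta_\psi).$$

We set

$$\mu_{n,k} = -F_0(h_\theta \Delta_\psi) + G_n(\Delta_\psi)$$

and we finally obtain,

$$-\frac{F_0(h_{\theta'}^2)}{2} + G_n(h_{\theta'}) + R_n(h_{\theta'}) = \frac{t^2 F_0(\Delta_\psi^2)}{2} - \frac{F_0((h_\theta - t\psi_{t,n})^2)}{2} + R_n(h_\theta - t\psi_{t,n}) + G_n(h_\theta - t\psi_{t,n}) + t\mu_{n,k} + o(1).$$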



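Note that by orthogonality $F_0(h_\theta \Delta_\psi) = -\sqrt{n}\, F_0\big[ (\psi_c - \Pi_{f_0,k}\psi_c) \sum_{j \ge k+1} \theta_{0j}\phi_j \big]$ so that $\mu_{n,k}$ does
not depend on $\theta$ and setting $T_k\theta = \theta - tB_{n,k}$ for all $\theta$, we can write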

J^^^^^ exp (^_ ^o((fe/-i^t,„)^) ^ ^^(^^ _ ^ ^^^^^ _ ^^^^^^^ 



h^nn ^""P (^-^^^^ + Gn{hf) + i?„(/i/)^ d7rfc(/) 



= e 2 e (l + o„(l)), 

Ie,nK e-^^''"^'''^^''"^'"'^d7rk{9) 

where $A'_n = \{ \theta : f_\theta \in A_n \}$. Moreover, for $k \le l_n$, $\|B_{n,k}\|_2 \le G/\sqrt{n}$, where $G$ depends on $c_0$
and $\|\psi_c\|_\infty$. So, if we set

$$T_k(A'_n) = \{ \theta \in \Theta_k \cap A'_n : \theta + tB_{n,k} \in A'_n \}$$

for all 9 G n{A'n), 

\\9 - 9o\\l < 2{logn)'el + — < 2e2(logn)^(l + 
since ne^ +oo. For all G 6^ n such that \\9 - 9o\\2 < M^l!^ 

for n large enough and we can write 
A'n,i = [0€A'n: \\9 - 9ok, < fe|)!£n | ^ = {0 G < : - ^olb < mogn)hn} 
then



QknA'cniA'j c efcn<.2 



(4.27) 



and under assumption (3.5),



(l+On(l)), 



(l + o.(l)). 



Therefore, 



Cn{t) := E[exp(tV^(V^(/)-V'(P„)))llA„(/)|X"] 
^pik\X^)Jk 



e 2 



.fc=i 



< 



(l + On(l)) 



^'0('/'c2)-J^0(^3,) 



,fc=l 



(l+On(l)) 



and 



k=l 

Besides, under the above conditions on the prior, with probability converging to 1,

for some positive constant c > 0. Then uniformly over k such that 0^ Pi A'^ i 7^ 

n[{A'^^,r\X\k] e-*'^".'= = 0(1) 

and 



Ut) > e*'^ j;p(A;|X")ne,nA„^0e-*'^-^e-*'^(l + o„(l)) 



In 



k=l 



This proves that the posterior distribution of $\sqrt{n}(\Psi(f) - \Psi(P_n))$ is asymptotically equal to a
mixture of Gaussian distributions with variances $V_{0k} = F_0(\psi_c^2) - F_0(\Delta_\psi^2)$, means $-\mu_{n,k}$ and
weights $p(k | X^n)$.

Now if $\|\Delta_\psi\|_\infty = o(1)$ ($k \to +\infty$), $G_n(\Delta_\psi) = o_P(1)$ and with probability converging to 1,

$$|\mu_{n,k}| \le \|f_0\|_\infty \sqrt{n} \Big( \sum_{j=k+1}^{+\infty} \psi_{c,j}^2 \Big)^{1/2} \Big( \sum_{j=k+1}^{+\infty} \theta_{0,j}^2 \Big)^{1/2} + o_n(1).$$

Thus if $k = k_n^*$,

$$|\mu_{n,k_n^*}| = o\big( \sqrt{n}\, (k_n^*)^{-\gamma-1/2} \big) + o_n(1) = o_n(1)$$

and Equality (3.8) is proved.